E-Commerce - Recommender Systems Assignment

Submitted by:

Itay Bouganim - ID: 305278384

Sahar Vaya - ID: 205583453

Download dataset

Explore dataset

Observe the top 20 mean ratings per movie

Question 1 - MovieLens 100K dataset analysis

Section 1.a.

Section 1.b.

Male average movie ratings

Female average movie ratings

Male-Female average movie ratings difference

Include only movies whose number of raters is above average, for both male and female raters.

Section 1.c.

Section 1.d.

Movie Popularity calculation

The formula as used in IMDB for calculating the Top Rated movies gives a true Bayesian estimate:

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C

where: v is the number of votes for the movie, m is the minimum number of votes required to be listed, R is the mean rating of the movie, and C is the mean rating across the whole dataset.
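A minimal sketch of this weighted-rating computation (the example vote counts and ratings are illustrative, not taken from the dataset):

```python
def weighted_rating(v, m, R, C):
    """True Bayesian estimate used by IMDB's Top Rated chart.

    v: number of votes for the movie
    m: minimum votes required to be listed
    R: mean rating of the movie
    C: mean rating across all movies
    """
    return (v / (v + m)) * R + (m / (v + m)) * C

# A movie with few votes is pulled toward the global mean C ...
print(weighted_rating(v=50, m=500, R=4.8, C=3.5))      # ~3.62
# ... while a heavily voted movie keeps almost its own mean R.
print(weighted_rating(v=10_000, m=500, R=4.8, C=3.5))  # ~4.74
```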

Section 1.e.

MovieLens 100K dataset Sparsity
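Sparsity here is the fraction of the user-item matrix with no rating. For MovieLens 100K (943 users, 1,682 movies, 100,000 ratings) it can be computed directly:

```python
n_users, n_items, n_ratings = 943, 1682, 100_000  # MovieLens 100K counts

# Fraction of user-item cells that hold no rating.
sparsity = 1 - n_ratings / (n_users * n_items)
print(f"{sparsity:.2%}")  # roughly 93.7% of the matrix is empty
```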

Observe the top-rating users and their rating counts

Mean ratings per user in the dataset

Calculate the mean ratings per user in the entire dataset
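A sketch of the per-user mean computation with pandas, on a toy stand-in for the ratings table (the column names are assumptions, following the usual MovieLens layout):

```python
import pandas as pd

# Toy stand-in for the u.data ratings table.
ratings = pd.DataFrame({
    "user_id":  [1, 1, 2, 2, 2, 3],
    "movie_id": [10, 20, 10, 30, 40, 20],
    "rating":   [4, 5, 3, 2, 4, 5],
})

# Mean rating given by each user.
mean_per_user = ratings.groupby("user_id")["rating"].mean()
print(mean_per_user)
```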

Explore how the top rating users rated the top rated movies

Question 2 - Non-Personal recommendations

Precision/Recall @ K calculations

Precision at k is the proportion of recommended items in the top-k set that are relevant

Recall at k is the proportion of relevant items found in the top-k recommendations

Here we assume that a relevant movie is a movie with rating value of 4 or above (actual and predicted).
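A minimal sketch of these two metrics, where `recommended` is the ranked top-k list and `relevant` is the set of items rated 4 or above (item ids here are invented for illustration):

```python
def precision_recall_at_k(recommended, relevant, k):
    """recommended: ranked list of item ids; relevant: set of relevant item ids."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k                                  # relevant among recommended
    recall = hits / len(relevant) if relevant else 0.0    # relevant items retrieved
    return precision, recall

relevant = {"A", "C", "E"}            # items rated >= 4
recommended = ["A", "B", "C", "D"]    # ranked recommendations
print(precision_recall_at_k(recommended, relevant, k=4))  # (0.5, 0.666...)
```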

Load the training and test data used to train the models in the following questions

Compare the rating distribution in our training data and our test data

Compare the user gender distribution in our training data and our test data

Compare the user age distribution in our training data and our test data

Section 2.a.

Baseline rating prediction models based on the global average rating per movie in the entire MovieLens 100K dataset
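A sketch of such a baseline on toy data (column names are assumptions): the model predicts each movie's global mean rating, falling back to the overall mean for movies unseen in training.

```python
import pandas as pd

# Toy training ratings; the baseline predicts each movie's global mean rating.
train = pd.DataFrame({
    "movie_id": [10, 10, 20, 20, 20],
    "rating":   [4, 2, 5, 4, 3],
})
baseline = train.groupby("movie_id")["rating"].mean()

def predict(movie_id, global_mean=train["rating"].mean()):
    # Fall back to the overall mean for movies not seen in training.
    return baseline.get(movie_id, global_mean)

print(predict(10), predict(20), predict(99))
```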

Entire population baseline predictions

Section 2.b.

Male population baseline predictions

Female population baseline predictions

Compare the baseline models' loss values with regard to population gender

Compare the baseline models' precision/recall @ K values with regard to population gender

Question 3 - Personal recommendations using TuriCreate library

The TuriCreate recommender toolkit provides a unified interface to train a variety of recommender models and use them to make recommendations.

Section 3.a.

Implement rating prediction models according to the following models:

Item2Item Similarity

Cosine similarity

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space. It is defined to equal the cosine of the angle between them, which is also the same as the inner product of the same vectors normalized to both have length 1.

cosine.png
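The definition above can be written directly in NumPy (the example vectors are illustrative):

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = a.b / (||a|| * ||b||) for two non-zero vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity([1, 0], [0, 1]))   # orthogonal vectors -> 0.0
print(cosine_similarity([1, 2], [2, 4]))   # same direction -> 1.0
```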

Pearson Correlation

In statistics, the Pearson correlation coefficient, also referred to as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC), or the bivariate correlation, is a measure of linear correlation between two sets of data.

pearson.png
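Pearson's r is the covariance of the two samples divided by the product of their standard deviations; equivalently, the cosine similarity of the mean-centered vectors. A minimal sketch with illustrative ratings:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

ratings_a = [1, 2, 3, 4, 5]
ratings_b = [2, 4, 6, 8, 10]   # perfectly linear relationship -> r = 1
print(pearson_r(ratings_a, ratings_b))
```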

Item Content

A recommender based on the similarity between item content, rather than on user interaction patterns, to compute similarity.

Matrix Factorization

Matrix factorization is a class of collaborative filtering algorithms used in recommender systems. Matrix factorization algorithms work by decomposing the user-item interaction matrix into the product of two lower dimensionality rectangular matrices.

mf.png
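A toy sketch of this decomposition, fitting two low-rank factor matrices to the observed entries with plain SGD (the matrix, learning rate, and dimensions are illustrative choices, not the settings used later in the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy user-item rating matrix; 0 marks an unobserved entry.
R = np.array([[5, 3, 0],
              [4, 0, 1],
              [0, 1, 5]], dtype=float)
k = 2                                   # latent dimension
P = 0.1 * rng.standard_normal((3, k))   # user factor matrix
Q = 0.1 * rng.standard_normal((3, k))   # item factor matrix

lr, reg = 0.02, 0.01
for _ in range(5000):                   # SGD over observed entries only
    for u, i in zip(*R.nonzero()):
        err = R[u, i] - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

# The product of the two low-rank factors approximates the observed ratings.
print(np.round(P @ Q.T, 1))
```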

Section 3.b.

Section 3.c.

Question 4 - Keras Neural Collaborative Filtering models

NCF: A fusion of GMF and MLP

NCF has two components, GMF (Generalized Matrix Factorization) and MLP (Multi-Layer Perceptron), with the following benefits: GMF applies a linear kernel to model user-item interactions, like MF.

MLP uses multiple neural layers to model nonlinear interactions. NCF combines these models to superimpose their desirable characteristics: it concatenates the outputs of GMF and MLP before feeding them into the final NCF layer.

ncf.png
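A plain-NumPy sketch of a single NCF forward pass, showing how the two branches combine (the embedding tables, layer widths, and weights here are random placeholders, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, k = 4, 5, 3

# Separate embedding tables for the GMF and MLP branches (an NCF convention).
user_gmf, item_gmf = rng.standard_normal((n_users, k)), rng.standard_normal((n_items, k))
user_mlp, item_mlp = rng.standard_normal((n_users, k)), rng.standard_normal((n_items, k))
W1, b1 = rng.standard_normal((2 * k, 8)), np.zeros(8)  # one MLP hidden layer
w_out = rng.standard_normal(k + 8)                     # final fusion layer weights

def ncf_forward(u, i):
    gmf = user_gmf[u] * item_gmf[i]                    # GMF: element-wise product
    mlp_in = np.concatenate([user_mlp[u], item_mlp[i]])
    h = np.maximum(mlp_in @ W1 + b1, 0)                # MLP branch with ReLU
    return np.concatenate([gmf, h]) @ w_out            # concat, then linear output

print(ncf_forward(0, 2))
```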

Prepare data for Neural Collaborative Filtering model training

Assign user ids and movie ids to a training data map
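Embedding layers expect contiguous zero-based indices, so raw ids are mapped through a dictionary. A minimal sketch (the raw ids are invented):

```python
# Map raw user ids to contiguous indices for embedding-layer input.
raw_user_ids = [10, 42, 10, 7]

# dict.fromkeys deduplicates while preserving first-seen order.
user2idx = {uid: idx for idx, uid in enumerate(dict.fromkeys(raw_user_ids))}
encoded = [user2idx[uid] for uid in raw_user_ids]

print(user2idx)   # {10: 0, 42: 1, 7: 2}
print(encoded)    # [0, 1, 0, 2]
```

The same mapping is built for movie ids, and must be reused as-is when encoding the test data.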

Section 4.a.

We will implement an NCF (Neural Collaborative Filtering) model starting with 1 hidden dense layer.

Section 4.b.

We can clearly see that our initial Neural Collaborative Filtering model suffers from overfitting (validation loss increases while training loss decreases).

In our second iteration we will try to solve it by:

We can already see an improvement, our model suffers less from overfitting.

We will now try to increase the number of hidden layers from 1 to 3 (20 neurons wide), with dropout separating the layers, to increase the learning capacity of the MLP side of our NCF model.

We will try to make the model 'wider' by increasing the number of neurons from 20 to 64 in each of the 3 hidden layers on the MLP side of the model.

Section 4.c.

Question 5 - Adding Extra User and Movie data to our models (DeepFM)

DeepFM combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture.

Compared to the latest Wide & Deep model from Google, DeepFM has a shared input to its "wide" and "deep" parts, with no need for feature engineering besides raw features.

DeepFM consists of an FM component and a deep component which are integrated in a parallel structure. The FM component is the same as the 2-way factorization machines which is used to model the low-order feature interactions. The deep component is a multi-layered perceptron that is used to capture high-order feature interactions and nonlinearities. These two components share the same inputs/embeddings and their outputs are summed up as the final prediction.

The advantage of DeepFM over the Wide & Deep model is that it reduces the effort of hand-crafted feature engineering by identifying feature combinations automatically.

(figure: wide and deep architecture of DeepFM; the wide and deep components share the same raw input)

Section 5.a.

Preprocessing

In order to feed the model with additional features we need to first preprocess them. We will use the following features from each user:

We will use the following features in regards to each movie:

Extract the movie genres from binary column into a list of genres

Convert the genres for each movie to a sequence representing the genres for each movie.
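A sketch of this conversion on a toy u.item-style frame with one binary column per genre (the genre column names here are assumptions):

```python
import pandas as pd

# Toy movie table: one binary indicator column per genre.
movies = pd.DataFrame({
    "movie_id": [1, 2],
    "Action":   [1, 0],
    "Comedy":   [0, 1],
    "Drama":    [1, 1],
})
genre_cols = ["Action", "Comedy", "Drama"]

# Collapse the one-hot columns into a list of genre names per movie.
movies["genres"] = movies[genre_cols].apply(
    lambda row: [g for g in genre_cols if row[g] == 1], axis=1)

print(movies[["movie_id", "genres"]])
```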

Add user gender and age features to our training and testing data (converting gender 'M' and 'F' to 1 and 0).
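The gender encoding is a simple mapping; a sketch on toy user rows (column names are assumptions):

```python
import pandas as pd

users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "gender":  ["M", "F", "M"],
    "age":     [24, 31, 45],
})

# Encode gender as a binary feature: 'M' -> 1, 'F' -> 0.
users["gender"] = users["gender"].map({"M": 1, "F": 0})
print(users)
```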

Section 5.b.

DeepFM model architecture

fst.png

sec.png

1st order factorization machines (summation of all 1st order embed layers)

2nd order factorization machines (summation of dot product between 2nd order embed layers)

Deep part (DNN model on shared embed layers)

1st order factorization machines

The 1st order part requires each feature to map to a scalar.

For the 2nd order FM, each feature is stacked into a concat_embed_2d layer with shape (None, p, k),

where k is the matrix factorization latent dimension and p is the feature dimension.

The calculation of the interaction terms can be simplified using:

calc.png
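This is the standard FM identity: the O(p^2) sum of pairwise interactions equals 0.5 * sum over latent factors of (square of sum) minus (sum of squares), which costs only O(p*k). A quick numerical check with random values:

```python
import numpy as np

rng = np.random.default_rng(2)
p, k = 6, 3                      # p features, latent dimension k
x = rng.standard_normal(p)       # feature values
V = rng.standard_normal((p, k))  # per-feature embedding vectors

# Naive O(p^2) form: sum over pairs of <v_i, v_j> * x_i * x_j.
naive = sum((V[i] @ V[j]) * x[i] * x[j]
            for i in range(p) for j in range(i + 1, p))

# Simplified O(p*k) form: 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2].
xv = x @ V                                            # shape (k,)
simplified = 0.5 * (xv ** 2 - (x ** 2) @ (V ** 2)).sum()

print(np.isclose(naive, simplified))  # True
```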

Deep Part (DNN)

1st order factorization machines plot

2nd order factorization machines plot

DNN part plot

DeepFM model full plot

DeepCTR DeepFM model

Extra user features:

Extra Movie features:

Section 5.c.

We will try different combinations of user and movie features to see what works best.

Use the DeepCTR DeepFM model and ignore user age

Ignore movie categorical genre data

Use only movie genre data

Section 5.d.

DeepFM models comparison

Section 5.e.

Overall models comparison

We can see that, overall, the more complicated models we used show no substantial differences in loss (MAE/RMSE) or in Precision/Recall @ K. Therefore, and due to its consistency over multiple runs, we would recommend the DeepCTR-based DeepFM model, using the additional features of user gender and movie genres, to predict ratings for the MovieLens 100K dataset.